Abstract

There exists a considerable amount of research claiming a puzzling anti-correlation
between the neutrino detection rate at the Homestake experiment and indicators of
solar activity such as the sunspot number, giving rise to explanations involving the
hypothesis of a neutrino magnetic moment. It is argued here that the claimed significant
anti-correlation is due to a statistical fallacy. A proper test based on certain optimality
criteria fails to detect a significant time variation of the neutrino flux in concert with
the sunspot number, providing evidence that the observations are consistent with no
correlation between the two series.

Solar neutrinos are the only known particles to reach Earth directly from the solar
core and thus allow a direct test of the theories of stellar evolution and nuclear energy
generation [?]. A perceived anti-correlation between the neutrino detection rate at the
Homestake experiment [?] and indicators of solar activity such as the sunspot number has
been the object of a considerable amount of research [2-12], yielding claims of statistically
highly significant results. Such time variations of the solar neutrino flux are not possible
in minimal standard electroweak theory and have motivated proposals for solutions of the
solar neutrino problem based upon the hypothesis of a large neutrino magnetic moment.

However, the standard tests for correlation used in the research cited above require
assumptions that are usually not met in a time-series context, where these tests may
readily produce erroneous, highly significant results. Figure 1 illustrates one aspect of
this fallacy, which is often ignored by statistics textbooks and therefore easily goes
unrecognized in scientific work: The top scatterplot shows the first 100 of 109 typical
independent observations $(X_1, Y_1), \ldots, (X_{109}, Y_{109})$ from a standard bivariate
normal distribution. The bottom scatterplot shows the 100 running means of length 10,
$\left(\frac{1}{10}\sum_{i=k}^{k+9} X_i,\ \frac{1}{10}\sum_{i=k}^{k+9} Y_i\right)$,
$k = 1, \ldots, 100$. The correlation is visibly larger in the bottom plot. Indeed,
Pearson's correlation coefficient $r$ is 0.12 for the top plot, and 0.30 for the bottom
plot. However, the probability of obtaining values of $|r|$ of at least the observed
size is larger for the situation of the bottom plot (27%) than for that of the top plot (24%),
as can be verified by simulations! This example illustrates the fact that common tests for
correlation between two series tend to give erroneous, highly significant results when there
is dependence within each of the two series, e.g. when the series exhibit periodic behavior
or are smoothed, a commonly employed procedure either implicitly in the data collection
process or afterwards.
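
The running-mean effect described above can be checked with a small Monte Carlo sketch. The code below is only an illustration under stated assumptions, not the simulation used for the quoted 24% and 27%: the function names, the seed and the number of simulations (2000) are hypothetical choices, and only the Python standard library is used.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def running_mean(z, width=10):
    """Running means of the given width; 109 values give 100 means."""
    return [sum(z[k:k + width]) / width for k in range(len(z) - width + 1)]

def simulate_abs_r(n_sims=2000, seed=0):
    """|r| for 100 raw pairs and for 100 running means of independent data."""
    rng = random.Random(seed)
    raw, smooth = [], []
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(109)]
        y = [rng.gauss(0.0, 1.0) for _ in range(109)]
        raw.append(abs(pearson_r(x[:100], y[:100])))
        smooth.append(abs(pearson_r(running_mean(x), running_mean(y))))
    return raw, smooth

def tail_prob(samples, threshold):
    """Fraction of simulated |r| values at least as large as the threshold."""
    return sum(s >= threshold for s in samples) / len(samples)

raw, smooth = simulate_abs_r()
print("P(|r| >= 0.12), raw pairs:    ", tail_prob(raw, 0.12))
print("P(|r| >= 0.30), running means:", tail_prob(smooth, 0.30))
```

Both series are independent by construction, so any large $|r|$ among the running means is produced by the smoothing alone.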

The sunspot numbers clearly have a strong dependence structure due to the 11-year
period of the sunspot cycle. Table 1 shows how easily one is led to an erroneous claim of
a significant correlation between the sunspot numbers and an independent random series,
this time using Spearman's rank correlation coefficient $r_s$, another popular measure of
correlation: $X$ is taken to be the series of the 100 monthly sunspot numbers starting
January 1970. $Y$ is a random walk with independent Gaussian increments in row 1, and
a 2-point and 4-point running mean of independent standard Gaussian random variables
in rows 2 and 3, respectively. $Y$ was simulated $10^5$ times for each case, and the columns
give the relative frequency of rejection of the null hypothesis of independence at nominal
significance levels 5%, 1% and 0.1%, using the null distribution of $r_s$ given in [17]. For
example, at the 1% significance level one is led to the conclusion that there is a correlation
with the random walk about 77% of the time! A comparison of rows 2 and 3 shows that
a larger degree of smoothing applied to one series makes the correlation seemingly more
significant, a fact that will be of importance below. The effect described above is relevant
quite generally for many tests of association or correlation, such as the $\chi^2$ statistic for
contingency tables, or Kendall's tau statistic. It applies directly to those published results
on a perceived correlation between the sunspot number and the neutrino flux that employ
a smoothing of the neutrino flux.
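
The kind of Monte Carlo study summarized in rows 1 to 3 of Table 1 can be sketched as follows. Two points are assumptions and differ from the study described above: the real sunspot numbers are replaced by a hypothetical smooth stand-in (a cosine with a 132-month period), and the exact null distribution of $r_s$ from [17] is replaced by the large-sample normal approximation $r_s\sqrt{n-1} \sim N(0,1)$. The rejection rates it prints are therefore not expected to reproduce the table entries; only the qualitative pattern is of interest.

```python
import math
import random

def ranks(values):
    """Ranks 1..n of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman(x, y):
    """Spearman's r_s via the rank-difference formula (valid without ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def random_walk(rng, n):
    """Partial sums of independent standard Gaussian increments (row 1)."""
    total, path = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        path.append(total)
    return path

def running_mean_series(width):
    """Factory for width-point running means of white noise (rows 2 and 3)."""
    def make(rng, n):
        z = [rng.gauss(0.0, 1.0) for _ in range(n + width - 1)]
        return [sum(z[k:k + width]) / width for k in range(n)]
    return make

def rejection_rate(make_y, x, n_sims, z_crit, rng):
    """Share of simulations in which independence is (wrongly) rejected."""
    n = len(x)
    threshold = z_crit / math.sqrt(n - 1)  # normal approximation, an assumption
    hits = sum(abs(spearman(x, make_y(rng, n))) > threshold for _ in range(n_sims))
    return hits / n_sims

# ASSUMPTION: smooth stand-in for the 100 monthly sunspot numbers, not real data.
# The 0.3 phase offset only avoids exactly tied values.
x_standin = [60.0 + 50.0 * math.cos(2.0 * math.pi * (k + 0.3) / 132.0) for k in range(100)]

rng = random.Random(1)
Z_CRIT_1PCT = 2.5758  # two-sided 1% point of the standard normal distribution
rate_white = rejection_rate(running_mean_series(1), x_standin, 1000, Z_CRIT_1PCT, rng)
rate_mean2 = rejection_rate(running_mean_series(2), x_standin, 1000, Z_CRIT_1PCT, rng)
rate_mean4 = rejection_rate(running_mean_series(4), x_standin, 1000, Z_CRIT_1PCT, rng)
rate_walk = rejection_rate(random_walk, x_standin, 1000, Z_CRIT_1PCT, rng)
print("white noise:", rate_white, " 2-point:", rate_mean2,
      " 4-point:", rate_mean4, " random walk:", rate_walk)
```

The white-noise case serves as a control: without dependence in $Y$, the rejection rate should stay near the nominal 1%.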

Furthermore, the assumptions of these tests can also be violated in other important
ways. For example, tests for correlation using $r_s$ or Kendall's tau require that the
distribution of the components of at least one of the two series is invariant under permutations,
which implies equal means and variances of the measurements in that series. Row 4 of
Table 1 provides an important example that violates this requirement: The neutrino flux is
taken to be constant equal to 1 for a random time which is distributed exponentially with
mean 10 months, then the flux equals 3 for a random time with the same distribution, then
it is set back to 1, etc. The flux is measured independently each month with a standard
Gaussian measurement error. Incidentally, a typical simulation of this model even looks
similar to the real neutrino data. Simulations of the flux from this model are uncorrelated
with the sunspot number and have no connection to the solar cycle whatsoever. (There is
nothing special about the exponential distribution chosen: virtually any random or
deterministic time will produce similar results to the ones quoted in the following.) Still, row 4
shows that $r_s$ erroneously reports a correlation at the 1% level for 22.4% of the
simulations. Clearly, this test misinterprets a change in the neutrino flux that is unrelated to the
solar cycle as a correlation with the solar cycle. One can reproduce this effect with all
the tests employed in [2-12]. This example makes clear that in this time series context,
it is not correct to interpret significant results of these tests as significant evidence for a
correlation with the solar cycle, even if no smoothing of the neutrino data is employed.
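
A minimal sketch of the switching-flux model of row 4 is given below. As before, the sunspot series is replaced by a hypothetical cosine stand-in and the 1% critical value comes from the normal approximation to the null distribution of $r_s$; both are assumptions, so the printed rate illustrates the effect rather than reproducing the 22.4% quoted above.

```python
import math
import random

def spearman(x, y):
    """Spearman's r_s via the rank-difference formula (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        out = [0] * len(v)
        for rank, idx in enumerate(order, start=1):
            out[idx] = rank
        return out
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def step_flux_measurements(rng, n_months, mean_duration=10.0):
    """Monthly measurements of a flux alternating between 1 and 3.

    The flux stays at each level for an exponentially distributed time with
    the given mean (in months); each measurement has a standard Gaussian error.
    """
    level, other = 1.0, 3.0
    next_switch = rng.expovariate(1.0 / mean_duration)
    series = []
    for month in range(1, n_months + 1):
        while month > next_switch:  # one or more switches since last month
            level, other = other, level
            next_switch += rng.expovariate(1.0 / mean_duration)
        series.append(level + rng.gauss(0.0, 1.0))
    return series

# ASSUMPTION: smooth stand-in for the monthly sunspot numbers, not real data.
x_standin = [60.0 + 50.0 * math.cos(2.0 * math.pi * (k + 0.3) / 132.0) for k in range(100)]

rng = random.Random(4)
n_sims = 1000
threshold = 2.5758 / math.sqrt(len(x_standin) - 1)  # nominal 1% level, normal approx.
rejections = sum(
    abs(spearman(x_standin, step_flux_measurements(rng, 100))) > threshold
    for _ in range(n_sims)
)
rate_step = rejections / n_sims
print("rejection rate at nominal 1% level:", rate_step)
```

The simulated flux never uses the stand-in series, so every rejection counted here is a false claim of correlation.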

One may ask whether these tests are at least providing evidence for a time variation,
not necessarily in concert with the solar cycle. However, due to the unequal uncertainties
in the neutrino measurements, these tests are also not valid for testing whether the
flux is constant: For an illustration, let $x = (3, 2, 1, 5)$ be a vector of four observations,
and $Y_1, \ldots, Y_4$ be four independent Gaussian random variables with a common mean and
standard deviations 1, 1, 1 and 4. The $Y_i$ represent observations of a constant quantity with
measurement error. In 7.0% of $10^5$ simulations of the $Y_i$, the correlation $r_s$ between $x$ and $Y$
was equal to 1, whereas the table for the exact null distribution of $r_s$ gives a value of 4.17%
(see e.g. Table VIII in [18]). Similar results obtain when the significance of the $\chi^2$- and
$F$-statistics is evaluated by randomly shuffling the data [?], as the distributions of these
statistics are not invariant under those permutations: The best correlation (smallest value
of $\chi^2$, resp. largest value of $F$) is obtained by exactly one of the $4! = 24$ permutations of
the data, yielding a significance level of 1/24 = 4.17%. However, this best correlation was
obtained in 10.3% of the simulations. While this effect seems to become less severe with
more data or more equal uncertainties, the example shows that these tests lack proper
justification and can produce invalid results. More importantly, when a modified test is
used that accounts for the uncertainties in a proper way, then the highly significant results
reported for the neutrino data disappear.
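
The four-observation example is easy to check by simulation. The sketch below assumes a common mean of 0 for the $Y_i$ (any common value gives the same ranks); the helper name and the seed are hypothetical choices.

```python
import random

def perfect_rank_agreement(x, y):
    """True when y is ordered exactly like x, i.e. Spearman's r_s equals 1."""
    order_x = sorted(range(len(x)), key=lambda i: x[i])
    order_y = sorted(range(len(y)), key=lambda i: y[i])
    return order_x == order_y

x = (3, 2, 1, 5)
sds = (1.0, 1.0, 1.0, 4.0)   # unequal measurement uncertainties
COMMON_MEAN = 0.0            # ASSUMPTION: the value of the common mean is arbitrary

rng = random.Random(2)
n_sims = 100_000
hits = 0
for _ in range(n_sims):
    y = [rng.gauss(COMMON_MEAN, sd) for sd in sds]
    hits += perfect_rank_agreement(x, y)

freq = hits / n_sims
nominal = 1 / 24             # exact null probability of r_s = 1 for n = 4
print("simulated P(r_s = 1):", freq, " nominal:", round(nominal, 4))
```

The excess over 1/24 arises because the observation with the largest uncertainty is also the one most likely to be the largest, which favors the ordering of $x$.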

The neutrino data that shall first be examined are the 108 estimates $N_i$ of the
neutrino flux provided by the Homestake experiment [?] up to run no. 133, so that
$N_i = \mathrm{flux}(t_i) + \sigma_i \epsilon_i$, $i = 1, \ldots, 108$. Here $\mathrm{flux}(t)$ denotes the neutrino flux at time $t$, which
is possibly time-varying. The uncertainties $\sigma_i$ given by the Homestake experiment have
recently been reanalyzed by the Homestake team, resulting in improved uncertainties that
have generously been made available by Dr. Kenneth Lande (private communication).
The standardized measurement errors $\epsilon_i$ for the various runs are independent by the
design of the experiment. A test for correlation can now be developed by examining how
linear functions $a + b\,S_i$ of the monthly sunspot numbers $S_i$ explain $\mathrm{flux}(t_i)$, i.e. using
regression techniques. Under the null hypothesis of a constant neutrino flux, $\mathrm{flux}(t) = a$, the
distribution of the scaled differences $d_i = (N_i - a)/\sigma_i$, $i = 1, \ldots, 108$, is invariant under
permutations, which justifies the validity of a permutation test for the statistic
$T = \sum_{i=1}^{108} S_i d_i$. This statistic is sensitive to trends in $\mathrm{flux}(t)$ that vary in concert with the $S_i$, and
possesses certain optimality properties for this type of problem [19]. $a$ was estimated by the
standard estimate $\left(\sum_{i=1}^{108} N_i/\sigma_i^2\right) / \sum_{i=1}^{108} 1/\sigma_i^2$. The (improved) uncertainties provided by the
Homestake experiment were used in the same way as in [?], i.e. the test was done using
both 'average errors' and 'upper errors' for the $\sigma_i$. Using $10^4$ random permutations, the
test resulted in a two-tailed significance probability of 16.3% for average errors, and 10.4%
for upper errors.
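
The permutation test for $T$ can be sketched along the following lines. This is a minimal illustration under stated assumptions, not the code behind the significance probabilities quoted above: the data are synthetic (the Homestake measurements are not reproduced here), the sunspot numbers are a hypothetical cosine stand-in, and the two-tailed probability is computed from the sunspot series centered at its mean, which shifts $T$ by the same constant for every permutation and so amounts to centering $T$ at its permutation mean.

```python
import math
import random

def weighted_mean(n_obs, sigma):
    """Inverse-variance weighted mean, the standard estimate of a constant flux."""
    w = [1.0 / s ** 2 for s in sigma]
    return sum(wi * ni for wi, ni in zip(w, n_obs)) / sum(w)

def permutation_test(n_obs, sigma, s, n_perm=2000, seed=0):
    """Two-tailed permutation test for T = sum_i S_i d_i, d_i = (N_i - a) / sigma_i."""
    rng = random.Random(seed)
    a_hat = weighted_mean(n_obs, sigma)
    d = [(ni - a_hat) / si for ni, si in zip(n_obs, sigma)]
    s_bar = sum(s) / len(s)
    s_centered = [si - s_bar for si in s]

    def stat(dd):
        return sum(sc * di for sc, di in zip(s_centered, dd))

    t_obs = stat(d)
    perm = d[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute the scaled differences, keep S fixed
        if abs(stat(perm)) >= abs(t_obs):
            hits += 1
    return t_obs, hits / n_perm

# Synthetic illustration only; every number below is an assumption.
rng = random.Random(3)
n_runs = 108
s = [60.0 + 50.0 * math.cos(2.0 * math.pi * (i + 0.3) / 132.0) for i in range(n_runs)]
sigma = [rng.uniform(0.5, 1.5) for _ in range(n_runs)]
flat = [2.0 + sg * rng.gauss(0.0, 1.0) for sg in sigma]
trend = [2.0 - 0.02 * (si - 60.0) + sg * rng.gauss(0.0, 1.0) for si, sg in zip(s, sigma)]

t_flat, p_flat = permutation_test(flat, sigma, s)
t_trend, p_trend = permutation_test(trend, sigma, s)
print("constant flux:        p =", p_flat)
print("anti-correlated flux: p =", p_trend)
```

Dividing by $\sigma_i$ before permuting is what distinguishes this test from shuffling the raw measurements: only the scaled differences have a permutation-invariant distribution under the null hypothesis.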

As pointed out by a referee, it is informative to evaluate T for earlier stretches of 
the data, where highly significant correlations have been reported: One obtains only 
marginally significant results (significance around 2%) for the data up to run no. 108. The 
same significance obtains for the stretch from run no. 49 to run no. 104, after the result is 
adjusted by a factor of 10 due to favorable 'fishing' for a significant stretch as suggested
in [?].

A summary of these results also allows the sometimes conflicting evidence reported
in [2-12] to be put together into a coherent picture: The data up to run no. 133 are clearly
consistent with a constant neutrino flux when tested against the alternative of a time
variation in concert with the solar cycle, according to a test with certain optimality properties
for this problem. The previously reported highly significant results in earlier stretches of
the data cannot be reproduced when the uncertainties and the permutation argument are
employed correctly. Only marginal evidence for a time variation is found in these stretches.
In any case, it would not be correct to interpret these results as evidence for a correlation
with the solar cycle. This allows these findings to be reconciled with the periodogram analysis
in [?], which shows no significant 11 yr component in the data. The reported improved
correlation with smoother functions of the sunspot numbers [3], [7, 10, 12] is not surprising
in light of the artifact exhibited in the third paragraph.

I wish to thank Raymond Davis and Kenneth Lande for kindly making the Homestake 
data available, Peter Sturrock for valuable discussions, and a referee for criticism that 
helped improve the paper. This work was supported by the Air Force Office of Scientific 
Research and by NASA. 

Figure 1: Top: Typical scatterplot of 100 independent standard bivariate normal
observations. Bottom: Running means of length 10. Pearson's correlation coefficient is 0.12 for
the top plot and 0.30 for the bottom plot.

Row  Time series                                               Relative frequency of rejection
                                                               at nominal significance level
                                                               5%        1%        0.1%

1    X = sunspot numbers                                       82.7%     77.3%     70.7%
     $Y_k = \sum_{i=1}^{k} Z_i$, $k = 1, \ldots, 100$

2    X = sunspot numbers                                       15.3%     6.1%      1.7%
     $Y_k = \sum_{i=k}^{k+1} Z_i$, $k = 1, \ldots, 100$

3    X = sunspot numbers                                       30.0%     17.5%     8.4%
     $Y_k = \sum_{i=k}^{k+3} Z_i$, $k = 1, \ldots, 100$

4    X = sunspot numbers                                       35.7%     22.4%     11.7%
     $Y_k = 1 + Z_k$ if $T_{2i} < k < T_{2i+1}$,
     $Y_k = 3 + Z_k$ else; $k = 1, \ldots, 100$

Table 1: Relative frequencies of rejection of the null hypothesis of independence at various
nominal significance levels in a Monte Carlo study using the nominal null distribution of
Spearman's correlation coefficient. $X$ is the series of the 100 monthly sunspot numbers
starting in January 1970. The $Z_i$ are independent standard normal random variables. $T_i$
is the sum of the first $i$ terms of a sequence of independent exponential random variables
with mean 10 months.